Search CORE

57 research outputs found

Implementing Database Coordination in P2P Networks

Author: Giunchiglia Fausto
Zaihrayeu Ilya
Publication date: 01/11/2003

We are interested in the interaction of databases in Peer-to-Peer (P2P) networks. In this paper we propose a new solution for P2P databases, which we call database coordination. We see coordination as the runtime management of semantic interdependencies among databases. We propose a data coordination model in which the notions of Interest Groups and Acquaintances play the most crucial role: interest groups support the grouping of peers according to the data models they have in common, and acquaintances allow peers to interoperate. Finally, we present an architecture supporting database coordination and show how it is implemented on top of JXTA.

CiteSeerX

Unitn-eprints Research

Lightweight Ontologies

Author: Giunchiglia Fausto
Zaihrayeu Ilya
Publication date: 01/10/2007

Ontologies are explicit specifications of conceptualizations. They are often thought of as directed graphs whose nodes represent concepts and whose edges represent relations between concepts. The notion of concept is understood as defined in Knowledge Representation, i.e., as a set of objects or individuals. This set is called the concept extension or the concept interpretation. Concepts are often lexically defined, i.e., they have natural language names which are used to describe the concept extensions (e.g., the concept mother denotes the set of all female parents). Therefore, when ontologies are visualized, their nodes are often shown with the corresponding natural language concept names. The backbone structure of the ontology graph is a taxonomy in which the relations are “is-a”, whereas the remaining structure of the graph supplies auxiliary information about the modeled domain and may include relations like “part-of”, “located-in”, “is-parent-of”, and many others.
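The graph view described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' representation: it assumes concepts are plain strings and edges are labelled with the relation name, and the class `Ontology` and its methods `add_relation` and `is_a` are hypothetical. Only the “is-a” backbone is followed when checking whether one concept is more specific than another, which mirrors the containment of concept extensions (every mother is a parent, and every parent is a person).

```python
from collections import defaultdict


class Ontology:
    """Concepts as nodes, labelled relations as directed edges (hypothetical sketch)."""

    def __init__(self) -> None:
        # edges[relation][source] -> set of target concepts
        self.edges = defaultdict(lambda: defaultdict(set))

    def add_relation(self, source: str, relation: str, target: str) -> None:
        self.edges[relation][source].add(target)

    def is_a(self, specific: str, general: str) -> bool:
        """True if `general` is reachable from `specific` via "is-a" edges.

        Only the taxonomic backbone is followed; auxiliary relations such as
        "part-of" or "is-parent-of" never contribute to subsumption here.
        """
        stack, seen = [specific], set()
        while stack:
            concept = stack.pop()
            if concept == general:
                return True
            if concept in seen:
                continue
            seen.add(concept)
            stack.extend(self.edges["is-a"][concept])
        return False


onto = Ontology()
onto.add_relation("mother", "is-a", "parent")
onto.add_relation("parent", "is-a", "person")
onto.add_relation("mother", "is-parent-of", "child")  # auxiliary, non-taxonomic edge
print(onto.is_a("mother", "person"))  # True
print(onto.is_a("person", "mother"))  # False
```

Keeping auxiliary relations in the same edge store but ignoring them during the “is-a” traversal reflects the split between the taxonomic backbone and the rest of the graph.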

Unitn-eprints Research

Encoding Classifications as Lightweight Ontologies

Author: Giunchiglia Fausto
Marchese Maurizio
Zaihrayeu Ilya
Publication date: 01/03/2006

Classifications have been used for centuries with the goal of cataloguing and searching large sets of objects. In the early days these were mainly books; lately they have also come to include Web pages, pictures and any kind of electronic information item. Classifications describe their contents using natural language labels, which has proved very effective in manual classification. However, natural language labels show their limitations when one tries to automate the process, as they make it very hard to reason about classifications and their contents. In this paper we introduce the novel notion of Formal Classification, as a graph structure where labels are written in a propositional concept language. Formal Classifications turn out to be a form of lightweight ontology. This, in turn, allows us to reason about them, to associate with each node a normal form formula which univocally describes its contents, and to reduce document classification to reasoning about subsumption.
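Reducing classification to subsumption can be illustrated with a brute-force propositional check: a concept C is subsumed by D exactly when no truth assignment makes C true and D false. The sketch below makes several assumptions not taken from the paper: formulas are nested tuples, each node's concept is taken to be the conjunction of the labels on its path from the root, a document is placed under the most specific nodes whose concept subsumes it, and a truth table stands in for the SAT reasoner a real system would use. All names (`subsumed_by`, `classify`, the node labels) are hypothetical.

```python
from itertools import product
from typing import Dict, List, Set, Tuple, Union

# A formula is an atomic concept (str) or a tuple:
# ("not", f), ("and", f, g, ...), ("or", f, g, ...).
Formula = Union[str, Tuple]


def atoms(f: Formula) -> Set[str]:
    """Collect the atomic concepts occurring in a formula."""
    if isinstance(f, str):
        return {f}
    return set().union(*(atoms(arg) for arg in f[1:]))


def evaluate(f: Formula, world: Dict[str, bool]) -> bool:
    """Truth value of `f` under one assignment of the atomic concepts."""
    if isinstance(f, str):
        return world[f]
    op, *args = f
    if op == "not":
        return not evaluate(args[0], world)
    if op == "and":
        return all(evaluate(a, world) for a in args)
    if op == "or":
        return any(evaluate(a, world) for a in args)
    raise ValueError(f"unknown connective: {op}")


def subsumed_by(specific: Formula, general: Formula) -> bool:
    """specific ⊑ general iff no assignment makes specific true and general false."""
    names = sorted(atoms(specific) | atoms(general))
    for values in product([False, True], repeat=len(names)):
        world = dict(zip(names, values))
        if evaluate(specific, world) and not evaluate(general, world):
            return False
    return True


def classify(document: Formula, nodes: Dict[str, Formula]) -> List[str]:
    """Most specific nodes whose concept subsumes the document's concept."""
    candidates = [n for n, c in nodes.items() if subsumed_by(document, c)]
    return [
        n
        for n in candidates
        if not any(
            subsumed_by(nodes[m], nodes[n]) and not subsumed_by(nodes[n], nodes[m])
            for m in candidates
        )
    ]


# Hypothetical node concepts: the conjunction of the labels on the path from the root.
nodes = {
    "Computer science": "cs",
    "Computer science/Databases": ("and", "cs", "databases"),
    "Computer science/Networks": ("and", "cs", "networks"),
}
doc = ("and", "cs", "databases", "p2p")
print(classify(doc, nodes))  # ['Computer science/Databases']
```

The truth-table check is exponential in the number of atoms, so it only suits small labels; it is shown because it is the most direct, easily verified statement of propositional subsumption.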

Unitn-eprints Research

Query Answering in Peer-to-Peer Database Networks

Author: Zaihrayeu Ilya
Publication date: 01/03/2003

Peer-to-Peer (P2P) has received significant attention from both industry and academia as a form of distributed computing lying between traditional distributed systems and the Web. P2P has found many applications in file sharing systems, distributed computation, instant messaging systems and so on. The database community investigates P2P as a new decentralized paradigm for distributed data management, which offers substantial advantages over existing database systems and also raises challenges that call for thorough research. The main focuses of this PhD thesis are: the study of the optimal balance between centralization and decentralization in a P2P database network needed for efficient query answering; the development of a logical and physical architecture to support P2P databases; the development of a query answering algorithm and of a formal theory describing P2P databases, query answering and its quality; and the development of a prototype and a performance study under different initial settings.

Unitn-eprints Research

Coordinating Mobile Databases: A System Demonstration

Author: Giunchiglia Fausto
Zaihrayeu Ilya
Publication date: 01/05/2004

In this paper we present the Peer Database Management System (PDBMS). This system runs on top of a standard database management system and allows it to connect its database with other (peer) databases on the network. A particularity of our solution is that PDBMS allows conventional database technology to be effectively operational in mobile settings. We think of database mobility as a database network where databases appear and disappear spontaneously, and where their network access points may change and are not known a priori. A further requirement (which the proposed PDBMS satisfies) is that databases must know, independently of their network access points, how to locate other databases and how to interoperate with them when servicing user requests (i.e., queries and updates). PDBMS is implemented on top of the Peer-to-Peer platform JXTA [1]. Peer-to-Peer (P2P) is a decentralized networking model where each party (called a node or a peer) has equivalent abilities in providing other parties with data and/or services. Peers are largely autonomous from other peers, and they interoperate in a local, point-to-point manner. All these notions are crucial from the point of view of mobility: databases may come and go, interact with different databases at different times or for answering different queries, the size of the network can dynamically shrink and expand depending on how many nodes are online, and databases can benefit from collaborating with one another by coordinating their data at runtime. JXTA helps us implement mobility by providing an IP-independent naming space to address nodes; it is independent of both the system and the networking platform. This makes PDBMS completely portable and, therefore, “pluggable” on top of multiple host platforms. Moreover, the proposed software solution is a self-contained application that can fit on a small-capacity storage device such as a flash drive, which can easily be carried around. 
Each peer on the network provides a source database described by a (source) schema, or supplies only the schema. In the latter case the node acts as a kind of mediator in the transitive propagation of data. Peers define semantic data dependency links between their schemas and use these links to coordinate data, i.e., to answer input queries and to propagate query results and updates. Input queries in the system are formulated w.r.t. the source schemas of single nodes. Peers are largely autonomous, in particular in what data they store, with which nodes they establish semantic data dependency links and coordinate their data, etc. PDBMS implements a fully decentralized data coordination model [2]. The four notions at the core of our model are Interest Groups, Acquaintances, Correspondence Rules, and Coordination Rules. The first notion allows for a global aggregation of nodes carrying similar information, while the second allows for a local, logical, point-to-point data exchange between databases. Acquaintance is not a symmetric notion, i.e., the fact that a node is acquainted with another node does not necessarily mean that the converse also holds. A node is an acquainted node for some other node if the latter is an acquaintance of the former. Acquaintances are associated with a set of acquaintance queries, which are used to import data from the acquaintances’ databases. An acquaintance query is the minimal building block of semantic data dependency links between peer databases. An acquaintance query is a conjunctive query [3] whose head refers to some relation at a node, and whose body is a query over the relations of one of the node’s acquaintances. Correspondence Rules solve the heterogeneity problem at the instance level, namely they specify mappings between objects of the domains of the two nodes’ databases. Finally, Coordination Rules are responsible for data coordination with acquaintances and acquainted nodes. 
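An acquaintance query, as described above, is a conjunctive query whose body ranges over an acquaintance's relations and whose head names a local relation. The following sketch evaluates such a query over in-memory relations by joining the body atoms one at a time. It is illustrative only: the prototype works against real source databases through its Wrapper, whereas here relations are Python lists of tuples, variables are marked with a leading `?`, and the relation names and the helpers `match` and `evaluate_cq` are hypothetical.

```python
from typing import Dict, Iterator, List, Set, Tuple

# An atom is (relation name, terms); terms starting with "?" are variables.
Atom = Tuple[str, Tuple[object, ...]]
Database = Dict[str, List[tuple]]


def is_var(term: object) -> bool:
    return isinstance(term, str) and term.startswith("?")


def match(atom: Atom, db: Database, binding: Dict[str, object]) -> Iterator[Dict[str, object]]:
    """Yield every extension of `binding` under which `atom` matches a row of `db`."""
    relation, terms = atom
    for row in db.get(relation, []):
        new = dict(binding)
        for term, value in zip(terms, row):
            if is_var(term):
                # setdefault binds a fresh variable; a clash means a failed join.
                if new.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield new


def evaluate_cq(head: Tuple[str, ...], body: List[Atom], db: Database) -> Set[tuple]:
    """Evaluate a conjunctive query body over `db` and project onto the head variables."""
    bindings: List[Dict[str, object]] = [{}]
    for atom in body:
        bindings = [ext for b in bindings for ext in match(atom, db, b)]
    return {tuple(b[v] for v in head) for b in bindings}


# Hypothetical relations exported by an acquaintance B.
b_db = {
    "patient": [(1, "Rossi"), (2, "Bianchi")],
    "visit": [(1, "2004-05-01"), (1, "2004-05-07"), (3, "2004-06-02")],
}
# Hypothetical acquaintance query of node A, importing into A's relation a_visits:
#   a_visits(?name, ?date) :- patient(?id, ?name), visit(?id, ?date)
rows = evaluate_cq(
    ("?name", "?date"),
    [("patient", ("?id", "?name")), ("visit", ("?id", "?date"))],
    b_db,
)
a_db = {"a_visits": sorted(rows)}
print(a_db["a_visits"])  # [('Rossi', '2004-05-01'), ('Rossi', '2004-05-07')]
```

Because the query is stored at node A and reads from B, it also illustrates why acquaintance is asymmetric: B needs no query over A's relations for A to import B's data.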
The data coordination model is implemented inside a concrete logical architecture, see Figure 1 (first level) and Figure 2 (second level). A node consists of PDBMS, a Source Database (SDB) and a Source Schema (SS). SS describes the shared part of SDB. PDBMS consists of the User Interface (UI), the Database Manager (DBM), the JXTA Layer and the Wrapper. DBM implements the four basic notions described above. The JXTA Layer is responsible for all of the node’s activities on the network, such as discovering new nodes and interest groups, joining and leaving groups, sending and receiving queries and query results, and so on. The Wrapper manages connections to SDB and is responsible for extracting and maintaining the source schema. Since different databases may require different database drivers, this module is adjustable depending on the underlying database. In the second-level architecture we “open” DBM and the JXTA Layer. Rectangles with rounded corners stand for data repositories which store various information. Normal rectangles represent executive modules. The arrows between UI, DBM, JXTA Layer and Wrapper have the same meaning as in Figure 1, namely, they represent procedure calls. Consider the JXTA Layer. The advertisements repository stores all discovered and locally created JXTA advertisements (see [2] for details on JXTA advertisements). Inside the rectangle, three advertisement types are represented, although in practice there are also others. The peer advertisement includes the source schema information. The Services module implements the core JXTA services (see [2] for details on JXTA core services) and DB-related services (i.e., the services required to run peers without databases). Consider now DBM. The P2P Management module allows users to control other modules and repositories of both DBM and the JXTA Layer. For instance, it makes it possible to create a new communication link (called a pipe), to make a new acquaintance or to modify a coordination rule. 
The control lines are shown as thick arrows from P2P Management to the other components. The Query Planner processes all input queries. It uses acquaintance queries, acquaintances and interest group information in order to detect the groups and nodes to which queries should be propagated. The Query Propagation (QP) module takes this information as input and uses correspondence rules for query rewriting. Finally, it uses pipes to send translated queries to acquaintances. When necessary, QP submits queries to the source database. The Results Handler receives results coming from acquaintances and translates them using Correspondence Rules. If these results are for a user query, it reports them to UI; otherwise, it sends them back to the node that sent the corresponding network query. Apart from this, the Results Handler gets results coming from the Wrapper and sends them either to UI or to the network. Finally, the Update Handler provides all the functionality necessary for processing updates. In order to facilitate performance study experiments, we provide one peer (called the super-peer) with some additional functionality. In particular, this peer can read acquaintance queries for all peers from a file and broadcast this file to all peers on the network. On receiving this file, each peer looks for the acquaintance queries relevant to it, reads them, and creates the necessary pipe connections. If an acquaintance queries file is received when a peer has already set up acquaintance queries and pipes, the peer drops the “old” queries and pipes and creates new ones where necessary. Thus, a super-peer can change the network topology at runtime. This is extremely convenient for running multiple experiments on different topologies. For the purpose of collecting experimental data, each node has an additional statistical module (not shown in Figure 1). 
This module accumulates various information about queries (and updates), such as the total execution time of a query, the number of query result messages received per acquaintance query, the volume of data in each message, and so on. During the lifetime of a network, each node accumulates this information. A super-peer can collect, at any given time, statistical information from all nodes on the network. The super-peer then processes all incoming statistical messages, aggregates them, and creates a final statistical report. The current version of PDBMS implements Acquaintances and Coordination Rules, and partially implements Interest Groups (only one base interest group is supported) and Correspondence Rules. Among other things, the prototype is capable of: discovering nodes and publishing the node’s resources on the network; remotely monitoring other nodes (e.g., checking whether their pipe connections are ready); sending queries to acquaintances, and receiving and reconciling incoming query results; discovering the network topology defined by paths of interdependent acquaintance queries; and executing a global update procedure on the network [4]. The prototype is implemented in Java and is about 6 Mbytes in size, including the JXTA libraries and excluding all metadata files (e.g., source schemas, JXTA advertisements, etc.). The Java Virtual Machine environment (about 40 Mbytes) is required to run the application. Thus, a self-contained application package fits in about 46 Mbytes and can be placed on a flash drive. The results of the first experiments show reasonable query answering and update propagation times in small networks (up to 20 nodes). For the experiments we created various source databases with several thousand tuples at each node, with different degrees of data overlap between nodes. The combination of database and P2P technologies has already received a lot of attention, see for instance [5], [6], [7], [8]. 
Among many other things (see [9], [2] for a detailed discussion of related work), our solution considers a new dimension for P2P databases, namely mobility, where PDBMS, the database, or both can be mobile ([9], in particular, provides the vision of our approach).
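The Results Handler's behaviour described in this demonstration (translate incoming results with Correspondence Rules, then report them to the UI for a user query or send them back to the node that issued a network query) can be sketched as follows. This is a hypothetical model, not PDBMS code and not a JXTA API: correspondence rules are reduced to per-acquaintance value maps, the UI and the network are stand-in lists, and `ResultsHandler`, `on_results` and the example values are all invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Row = tuple


@dataclass
class ResultsHandler:
    """Hypothetical sketch of the Results Handler's translate-then-route step."""

    # Correspondence rules modelled as per-acquaintance value maps:
    # object in the acquaintance's domain -> object in the local domain.
    correspondence: Dict[str, Dict[object, object]]
    # query id -> None for a local user query, else the node that sent the network query.
    origin: Dict[str, Optional[str]]
    ui: List[Row] = field(default_factory=list)
    outbox: List[Tuple[str, str, List[Row]]] = field(default_factory=list)

    def translate(self, acquaintance: str, rows: List[Row]) -> List[Row]:
        mapping = self.correspondence.get(acquaintance, {})
        # Values without a rule are passed through unchanged.
        return [tuple(mapping.get(v, v) for v in row) for row in rows]

    def on_results(self, query_id: str, acquaintance: str, rows: List[Row]) -> None:
        local_rows = self.translate(acquaintance, rows)
        sender = self.origin[query_id]
        if sender is None:
            self.ui.extend(local_rows)  # results of a user query are reported to the UI
        else:
            # results of a network query are sent back to the node that asked
            self.outbox.append((sender, query_id, local_rows))


handler = ResultsHandler(
    correspondence={"B": {"TN": "Trento", "MI": "Milano"}},
    origin={"q1": None, "q2": "C"},
)
handler.on_results("q1", "B", [("Rossi", "TN")])
handler.on_results("q2", "B", [("Bianchi", "MI")])
print(handler.ui)      # [('Rossi', 'Trento')]
print(handler.outbox)  # [('C', 'q2', [('Bianchi', 'Milano')])]
```

Passing unmapped values through unchanged is a design choice of this sketch; a stricter handler could instead reject rows that contain objects with no correspondence.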

Unitn-eprints Research

Coordinating Mobile Databases

Author: Giunchiglia Fausto
Zaihrayeu Ilya
Publication date: 01/01/2004

We are interested in the development of a database management layer which is completely portable and, therefore, “pluggable” on top of multiple host platforms. This layer, the so-called Peer Database Management System (PDBMS), must be able to remotely connect with its database and to connect it with other peer databases. We realize mobility by storing PDBMS on a flash drive. We realize network independence by developing a fully decentralized data coordination model. The two notions at the core of our model are Interest Groups and Acquaintances. The first notion allows for a global aggregation of nodes carrying similar information, while the second allows for a local, logical, point-to-point data exchange between databases. The system has been developed on top of the Peer-to-Peer platform JXTA.

CiteSeerX

Unitn-eprints Research

Peer-to-Peer Knowledge Management

Author: Bonifacio Matteo
Giunchiglia Fausto
Zaihrayeu Ilya
Publication date: 01/04/2005

Peer-to-Peer (P2P) is a decentralized networking paradigm where autonomous parties have equivalent capabilities in providing other parties with data and/or services. Knowledge Management (KM), on the other hand, is viewed as a core capacity for competing in the modern social and economic environment. In view of the emerging Semantic Web technologies, P2P is looking for knowledge-driven domains to better exploit its technological potential. At the same time, driven by economic and social trends, KM is questioning its assumption of centralization and is looking for a technological paradigm that would let it exploit its distributed dimension. In this paper we discuss the state of the art and trends in both the P2P and KM fields, discuss which synergies can benefit integrated P2P KM solutions, and present an implemented P2P KM system.

Unitn-eprints Research

A Classification of Semantic Annotation Systems

Author: Andrews Pierre
Pane Juan
Zaihrayeu Ilya
Publication date: 01/12/2010

The Subject-Predicate-Object triple annotation system is now well adopted in the research community; however, it does not always speak to end-users. In fact, explaining all the complexity of semantic annotation systems to laypeople can sometimes be difficult. We believe that this communication can be simplified by providing a meaningful abstraction of the state of the art in semantic annotation models; thus, in this article, we describe the issue of semantic annotation and review a number of research and end-user tools in the field. In doing so, we provide a clear classification scheme of the features of annotation systems. We then show how this scheme can be used to clarify the requirements of end-user use cases and thus simplify the communication between semantic annotation experts and the actual users of this technology.

Unitn-eprints Research

Semantics Disambiguation in Folksonomy: a Case Study

Author: Andrews Pierre
Pane Juan
Zaihrayeu Ilya
Publication date: 01/12/2010

Social annotation systems such as del.icio.us, Flickr and others have gained tremendous popularity among Web 2.0 users. One of the factors of their success was the simplicity of the underlying model, which consists of a resource (e.g., a web page), a tag (e.g., a text string), and a user who annotates the resource with the tag. However, due to the syntactic nature of the underlying model, these systems have been criticised for not being able to take into account the explicit semantics implicitly encoded by the users in each tag. In this article we: a) provide a formalisation of an annotation model in which tags are based on concepts instead of being free text strings; b) describe how an existing annotation system can be converted to the proposed model; c) report on the results of such a conversion on the example of a del.icio.us dataset; and d) show how the quality of search can be improved by the semantics of the converted dataset.
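The difference between string tags and concept-based tags can be shown with a small sketch of the (resource, tag, user) model in which the tag slot holds a concept identifier. This is an illustration under loud assumptions, not the article's conversion procedure: the hypothetical `LEXICON` maps each tag string to a single concept, so the disambiguation problem the article studies is sidestepped entirely, and `Annotation`, `convert`, `search` and the example URLs are invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Annotation:
    resource: str  # e.g., a web page URL
    concept: str   # a concept identifier instead of a free-text tag string
    user: str


# Hypothetical lexicon mapping tag strings to concept identifiers. A real
# conversion must disambiguate tags with several senses; this table sidesteps that.
LEXICON: Dict[str, str] = {
    "car": "concept:automobile",
    "automobile": "concept:automobile",
    "auto": "concept:automobile",
    "bike": "concept:bicycle",
}


def convert(triples: List[Tuple[str, str, str]]) -> List[Annotation]:
    """Map (resource, tag, user) triples to concept-based annotations; unknown tags are dropped."""
    return [
        Annotation(resource, LEXICON[tag.lower()], user)
        for resource, tag, user in triples
        if tag.lower() in LEXICON
    ]


def search(annotations: List[Annotation], query_tag: str) -> List[str]:
    """Return resources annotated with the concept behind `query_tag`, not the string."""
    concept = LEXICON.get(query_tag.lower())
    return sorted({a.resource for a in annotations if a.concept == concept})


raw = [
    ("http://example.org/a", "car", "alice"),
    ("http://example.org/b", "automobile", "bob"),
    ("http://example.org/c", "bike", "alice"),
]
anns = convert(raw)
print(search(anns, "auto"))  # ['http://example.org/a', 'http://example.org/b']
```

A purely syntactic search for the string "auto" would match none of the three annotations; matching on the concept retrieves both resources tagged with synonyms. Dropping tags that the lexicon does not cover is a simplification of this sketch.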

Unitn-eprints Research